53 research outputs found
The MDS Queue: Analysing the Latency Performance of Erasure Codes
In order to scale economically, data centers are increasingly evolving their
data storage methods from the use of simple data replication to the use of more
powerful erasure codes, which provide the same level of reliability as
replication but at a significantly lower storage cost. In particular, it is
well known that Maximum-Distance-Separable (MDS) codes, such as Reed-Solomon
codes, provide the maximum storage efficiency. While the use of codes for
providing improved reliability in archival storage systems, where the data is
less frequently accessed (or so-called "cold data"), is well understood, the
role of codes in the storage of more frequently accessed and active "hot data",
where latency is the key metric, is less clear.
In this paper, we study data storage systems based on MDS codes through the
lens of queueing theory, and term this the "MDS queue." We analytically
characterize the (average) latency performance of MDS queues, for which we
present insightful scheduling policies that form upper and lower bounds to
performance, and are observed to be quite tight. Extensive simulations are also
provided and used to validate our theoretical analysis. We also employ the
framework of the MDS queue to analyse different methods of performing so-called
degraded reads (reading of partial data) in distributed data storage
When Do Redundant Requests Reduce Latency ?
Several systems possess the flexibility to serve requests in more than one
way. For instance, a distributed storage system storing multiple replicas of
the data can serve a request from any of the multiple servers that store the
requested data, or a computational task may be performed in a compute-cluster
by any one of multiple processors. In such systems, the latency of serving the
requests may potentially be reduced by sending "redundant requests": a request
may be sent to more servers than needed, and it is deemed served when the
requisite number of servers complete service. Such a mechanism trades off the
possibility of faster execution of at least one copy of the request with the
increase in the delay due to an increased load on the system. Due to this
tradeoff, it is unclear when redundant requests may actually help. Several
recent works empirically evaluate the latency performance of redundant requests
in diverse settings.
This work aims at an analytical study of the latency performance of redundant
requests, with the primary goals of characterizing under what scenarios sending
redundant requests will help (and under what scenarios they will not help), as
well as designing optimal redundant-requesting policies. We first present a
model that captures the key features of such systems. We show that when service
times are i.i.d. memoryless or "heavier", and when the additional copies of
already-completed jobs can be removed instantly, redundant requests reduce the
average latency. On the other hand, when service times are "lighter" or when
service times are memoryless and removal of jobs is not instantaneous, then not
having any redundancy in the requests is optimal under high loads. Our results
hold for arbitrary arrival processes.Comment: Extended version of paper presented at Allerton Conference 201
The Expressive Power of Low-Rank Adaptation
Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method that
leverages low-rank adaptation of weight matrices, has emerged as a prevalent
technique for fine-tuning pre-trained models such as large language models and
diffusion models. Despite its huge success in practice, the theoretical
underpinnings of LoRA have largely remained unexplored. This paper takes the
first step to bridge this gap by theoretically analyzing the expressive power
of LoRA. We prove that, for fully connected neural networks, LoRA can adapt any
model to accurately represent any smaller target model if
LoRA-rank . We also quantify the approximation error
when LoRA-rank is lower than the threshold. For Transformer networks, we show
any model can be adapted to a target model of the same size with
rank- LoRA adapters.Comment: 40 pages, 5 figure
- …